Is Grandma’s old lucky coin fair?
How could we find out?
we flipped Grandma’s coin \(n=24\) times
we observed it land heads \(k = 7\) times
What now?
we want to infer latent variables, which are not directly observable, from observable data (e.g., a coin’s bias, (properties of) mental processes, etc.)
we often have a clear idea of how a vector of latent variables \(\theta\) makes each possible data observation \(D\) more or less likely, i.e., a likelihood function \(P(D \mid \theta)\)
think of the likelihood function as our theory of the data-generating process
we use the likelihood function to reason “backwards” from data to latent variables
the binomial distribution gives the probability of observing \(k\) successes in \(n\) coin flips with a bias of \(\theta\):
\[ P(k \mid \theta; n) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
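As a sanity check, the binomial likelihood can be computed directly from this formula; a minimal sketch in Python, using the numbers from Grandma’s coin:

```python
from math import comb

def binom_pmf(k, n, theta):
    """Probability of k successes in n flips with bias theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# likelihood of observing k=7 heads in n=24 flips under a fair coin
print(round(binom_pmf(7, 24, 0.5), 4))  # → 0.0206
```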
we fix a null hypothesis (e.g., the coin is perfectly fair: \(\theta = 0.5\))
the \(p\)-value gives the probability, under the null hypothesis, of observing an outcome at least as extreme as (i.e., no more likely than) the outcome actually observed [roughly put]
we fix a significance level, e.g., \(0.05\): the tolerated probability of falsely rejecting the null hypothesis (\(\alpha\)-error)
we speak of a significant test result iff the \(p\)-value is below the pre-determined significance level
we conventionally reject the null hypothesis iff the test result is significant
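For the running example, this procedure amounts to an exact binomial test; a sketch in pure Python, defining the two-sided \(p\)-value as the total probability of all outcomes no more likely than the observed one:

```python
from math import comb

def binom_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, k, theta_0 = 24, 7, 0.5          # data and null hypothesis
p_obs = binom_pmf(k, n, theta_0)
# two-sided p-value: sum over all outcomes at least as unlikely as k=7
p_value = sum(binom_pmf(i, n, theta_0)
              for i in range(n + 1)
              if binom_pmf(i, n, theta_0) <= p_obs)
print(round(p_value, 3))  # → 0.064
```

Since \(0.064 > 0.05\), the result is not significant: we do not reject the hypothesis that Grandma’s coin is fair.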
\(p\)-values are not to be confused with:
the probability that the null hypothesis is true
the degree of confidence that the true parameter is, say, \(\theta = 0.5\)
the conditional probability of \(X\) given \(Y\) is defined (if \(P(Y) \neq 0\)) as:
\[ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} \]
Bayes’ rule derives \(P(X \mid Y)\) from \(P(Y \mid X)\):
\[ P(X \mid Y) = \frac{P(Y \mid X) \cdot P(X)}{P(Y)} \]
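the rule follows in one step from the definition of conditional probability, since the joint probability factors both ways:
\[ P(X \mid Y)\, P(Y) = P(X \cap Y) = P(Y \mid X)\, P(X) \]
dividing both sides by \(P(Y)\) yields Bayes’ rule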
version for data analysis:
\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(D \, | \, \theta)}_{likelihood} \ \underbrace{P(\theta)}_{prior}\]
model likelihood \(P(D \, | \, \theta)\):
| | \(\theta=0\) | \(\theta=\frac{1}{3}\) | \(\theta=\frac{1}{2}\) | \(\theta=\frac{2}{3}\) | \(\theta=1\) |
|---|---|---|---|---|---|
| success | 0 | 0.33 | 0.5 | 0.67 | 1 |
| failure | 1 | 0.67 | 0.5 | 0.33 | 0 |
weighing in \(P(\theta)\):
| | \(\theta=0\) | \(\theta=\frac{1}{3}\) | \(\theta=\frac{1}{2}\) | \(\theta=\frac{2}{3}\) | \(\theta=1\) |
|---|---|---|---|---|---|
| success | 0.0 | 0.066 | 0.1 | 0.134 | 0.2 |
| failure | 0.2 | 0.134 | 0.1 | 0.066 | 0.0 |
posterior \(P(\theta \, | \, \text{heads})\) after one success:
| | \(\theta=0\) | \(\theta=\frac{1}{3}\) | \(\theta=\frac{1}{2}\) | \(\theta=\frac{2}{3}\) | \(\theta=1\) |
|---|---|---|---|---|---|
| posterior | 0.000 | 0.132 | 0.200 | 0.268 | 0.400 |
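This tabular computation is a grid approximation of the posterior and can be reproduced in a few lines of Python (the exact values are \(2/15 \approx 0.133\) and \(4/15 \approx 0.267\); the table shows 0.132 and 0.268 due to intermediate rounding):

```python
# grid approximation of the posterior after observing one success (heads)
thetas = [0, 1/3, 1/2, 2/3, 1]       # parameter grid
prior = [0.2] * 5                    # uniform prior over the grid
lik = [t for t in thetas]            # P(heads | theta) = theta for one flip
unnorm = [p * l for p, l in zip(prior, lik)]
posterior = [u / sum(unnorm) for u in unnorm]
print([round(p, 3) for p in posterior])
# → [0.0, 0.133, 0.2, 0.267, 0.4]
```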
likelihood
\[ P(k \mid \theta; n) = \binom{n}{k} \theta^{k} \, (1-\theta)^{n-k} \]
prior
\[ \theta \sim \text{Uniform}(0,1) \]
posterior
\[ P(\theta \mid k, n) = \frac{P(\theta) \ P(k \mid \theta; n)}{P(k; n)} \]
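With a uniform prior, equivalently \(\text{Beta}(1,1)\), this posterior is available in closed form by Beta-Binomial conjugacy: \(\theta \mid k, n \sim \text{Beta}(1+k,\ 1+n-k)\). A sketch for Grandma’s data:

```python
# conjugate update: Beta(1,1) prior + k=7 heads in n=24 flips
a0, b0 = 1, 1                 # prior pseudo-counts (Beta(1,1) = Uniform(0,1))
k, n = 7, 24                  # observed heads / total flips
a, b = a0 + k, b0 + n - k     # posterior is Beta(8, 18)
post_mean = a / (a + b)
print(a, b, round(post_mean, 3))  # → 8 18 0.308
```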
the 95% highest density interval (HDI) is a subset \(Y\) of parameter values with \(P(Y) = 0.95\) such that no point outside of \(Y\) has higher probability (density) than any point within \(Y\)
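A simple way to approximate an HDI is from posterior samples: sort them and take the shortest window that contains 95% of them. A sketch for the \(\text{Beta}(8, 18)\) posterior of the coin example (stdlib only; the result is a Monte Carlo approximation, roughly \([0.14, 0.49]\)):

```python
import random

random.seed(0)
# draw samples from the posterior Beta(8, 18) of the coin example
samples = sorted(random.betavariate(8, 18) for _ in range(50_000))

# 95% HDI ~ shortest interval containing 95% of the sorted samples
m = int(0.95 * len(samples))
lo, hi = min(((samples[i], samples[i + m - 1])
              for i in range(len(samples) - m + 1)),
             key=lambda iv: iv[1] - iv[0])
print(round(lo, 2), round(hi, 2))
```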
parameter estimation:
\[\underbrace{P(\theta \, | \, D)}_{posterior} \propto \underbrace{P(\theta)}_{prior} \ \underbrace{P(D \, | \, \theta)}_{likelihood}\]
model comparison
\[\underbrace{\frac{P(M_1 \mid D)}{P(M_2 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{P(D \mid M_1)}{P(D \mid M_2)}}_{\text{Bayes factor}} \ \underbrace{\frac{P(M_1)}{P(M_2)}}_{\text{prior odds}}\]
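For the coin example both marginal likelihoods are tractable: under a point hypothesis \(M_1\!: \theta = 0.5\) it is just the binomial probability of the data, and under \(M_2\!: \theta \sim \text{Uniform}(0,1)\) the integral evaluates to \(1/(n+1)\). A sketch (with equal prior odds, the posterior odds equal the Bayes factor):

```python
from math import comb

k, n = 7, 24
# marginal likelihood of M1 (point hypothesis theta = 0.5)
p_d_m1 = comb(n, k) * 0.5**n
# marginal likelihood of M2 (theta ~ Uniform(0,1)): the binomial pmf
# integrated over theta evaluates to 1 / (n + 1)
p_d_m2 = 1 / (n + 1)
bf_12 = p_d_m1 / p_d_m2
print(round(bf_12, 2))  # → 0.52
```

A Bayes factor of about 0.52 is only mild evidence against the fair-coin model, in line with the non-significant \(p\)-value above.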
prior predictive
\[ P(D) = \int P(\theta) \ P(D \mid \theta) \ \text{d}\theta \]
posterior predictive
\[ P(D \mid D') = \int P(\theta \mid D') \ P(D \mid \theta) \ \text{d}\theta \]
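Both predictives can be approximated by Monte Carlo: draw \(\theta\) from the prior (or posterior), then simulate new data. A sketch for predicting heads in 24 new flips after observing \(k=7\) in \(n=24\) (stdlib only; the posterior is taken as the conjugate \(\text{Beta}(8, 18)\)):

```python
import random

random.seed(1)
n_new, n_sim = 24, 20_000

def predictive_mean(draw_theta):
    """Monte Carlo predictive mean number of heads in n_new new flips."""
    total = 0
    for _ in range(n_sim):
        theta = draw_theta()
        total += sum(random.random() < theta for _ in range(n_new))
    return total / n_sim

# prior predictive: theta ~ Uniform(0,1); mean should be near 24 * 0.5 = 12
prior_mean = predictive_mean(lambda: random.uniform(0, 1))
# posterior predictive: theta ~ Beta(8,18); mean near 24 * 8/26 ~ 7.4
post_mean = predictive_mean(lambda: random.betavariate(8, 18))
print(round(prior_mean, 1), round(post_mean, 1))
```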
| | estimation | comparison | criticism |
|---|---|---|---|
| goal | which \(\theta\), given \(M\) & \(D\)? | which is better: \(M_0\) or \(M_1\)? | is \(M\) a good model of \(D\)? |
| methods | Bayes’ rule | Bayes factor, cross-validation | \(p\)-values, PPCs |
| computational tools | MCMC, variational Bayes | Savage-Dickey, bridge sampling | MC sampling |